-
In recent years, the MLIR framework has seen explosive growth due to the need for extensible deep learning compilers for hardware accelerators; examples include Triton [39], CIRCT [14], and ONNX-MLIR [22]. MLIR compilers introduce significant complexity in localizing bugs or inefficiencies because of their layered optimization and transformation process built from compilation passes. While existing delta debugging techniques can identify a minimal subset of IR code that reproduces a given bug symptom, applying them naively to MLIR is time-consuming because real-world MLIR compilers usually involve a large number of compilation passes. Compiler developers must also identify a minimized set of relevant compilation passes to reduce the footprint of MLIR compiler code to be inspected for a bug fix. We propose DuoReduce, a dual-dimensional reduction approach for MLIR bug localization. DuoReduce leverages three key ideas in tandem to build an efficient MLIR delta debugger. First, it reduces compiler passes that are irrelevant to the bug by identifying ordering dependencies among the different compilation passes. Second, it uses MLIR-semantics-aware transformations to expedite IR code reduction. Third, it exploits cross-dependence between the IR code dimension and the compilation pass dimension, tracking which IR code segments relate to which compilation passes so that unused passes can be removed. Experiments with three large-scale MLIR compiler projects show that DuoReduce outperforms syntax-aware reducers such as Perses and Vulcan in IR code reduction, by 31.6% and 21.5% respectively. Using those reducers while enumerating all possible compilation passes (on average 18 passes) could take up to 145 hours; by identifying ordering dependencies among compilation passes, DuoReduce cuts this time to 9.5 minutes. By identifying which compilation passes are unused when compiling the reduced IR code, DuoReduce reduces the number of passes by 14.6%, sparing developers from examining 281 lines of MLIR compiler code on average when fixing a bug. DuoReduce has the potential to significantly reduce debugging effort in MLIR compilers, which form the foundation of the current machine learning and hardware accelerator landscape.
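The delta debugging the abstract builds on can be pictured with the classic ddmin loop. The sketch below is not DuoReduce's algorithm (which layers pass-ordering dependencies and MLIR-semantics-aware transformations on top); it is a minimal baseline reducer, with `still_fails` as a hypothetical oracle that recompiles a candidate and checks whether the bug still reproduces.

```python
# Classic ddmin sketch: shrink `items` (e.g., IR lines or compilation
# passes) to a minimal sublist on which the failure oracle still holds.
def ddmin(items, still_fails):
    n = 2
    while len(items) >= 2:
        chunk = (len(items) + n - 1) // n              # ceiling division
        progress = False
        for start in range(0, len(items), chunk):
            complement = items[:start] + items[start + chunk:]
            if complement and still_fails(complement):
                items, n = complement, max(n - 1, 2)   # keep the smaller failing input
                progress = True
                break
        if not progress:
            if n >= len(items):                        # finest granularity reached
                break
            n = min(n * 2, len(items))                 # refine the partition
    return items

# Hypothetical usage: reduce IR lines under a fixed pass pipeline.
# reduced = ddmin(ir_lines, lambda ir: compile_with(passes, ir) == CRASH)
```

Running this same loop naively over every pass combination is what drives the 145-hour worst case above; pruning pass orderings via their dependencies is what collapses the search.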
-
Compiler technologies in deep learning and domain-specific hardware acceleration are increasingly adopting extensible compiler frameworks such as Multi-Level Intermediate Representation (MLIR) to facilitate more efficient development. With MLIR, compiler developers can easily define their own custom IRs in the form of MLIR dialects. However, the diversity and rapid evolution of such custom IRs make it impractical to manually write a custom test generator for each dialect. To address this problem, we design a new test generator called SynthFuzz that combines grammar-based fuzzing with custom mutation synthesis. The key essence of SynthFuzz is twofold: (1) it automatically infers parameterized, context-dependent custom mutations from existing test cases; (2) it then concretizes the mutation's content for the target context, reducing the chance of inserting invalid edits by performing k-ancestor and prefix/postfix matching. This obviates the need to manually define custom mutation operators for each dialect. We compare SynthFuzz to three baselines: Grammarinator, a grammar-based fuzzer without custom mutations; MLIRSmith, a custom test generator for MLIR core dialects; and NeuRI, a custom test generator for ML models with parameterization of tensor shapes. We conduct this comprehensive comparison on four different MLIR projects, each defining a new set of MLIR dialects for which manually writing a custom test generator would take weeks of effort. Our evaluation shows that SynthFuzz on average improves MLIR dialect pair coverage by 1.75×, which increases branch coverage by 1.22×. Further, we show that our context-dependent custom mutations increase the proportion of valid tests by up to 1.11×, indicating that SynthFuzz correctly concretizes its parameterized mutations with respect to the target context. Parameterization of the mutations reduces the fraction of tests violating the base MLIR constraints by 0.57×, increasing the time spent fuzzing dialect-specific code.
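The k-ancestor matching the abstract mentions can be illustrated with a small sketch. Everything below is illustrative rather than SynthFuzz's implementation: a toy parse tree, and a mutation that is applied only where the k nearest ancestor node kinds match the context recorded when the mutation was inferred.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:                                   # toy parse-tree node
    kind: str
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child

def ancestor_kinds(node: Node, k: int) -> list:
    out, cur = [], node.parent
    while cur is not None and len(out) < k:
        out.append(cur.kind)
        cur = cur.parent
    return out

@dataclass
class Mutation:                               # inferred from existing tests
    context: list                             # ancestor kinds at inference time
    new_kind: str

    def applies_at(self, node: Node, k: int = 2) -> bool:
        return ancestor_kinds(node, k) == self.context[:k]

# Usage: splice the mutation only where the k-ancestor context matches,
# which is what keeps the edit valid for the target dialect.
root = Node("module")
func = root.add(Node("func"))
op = func.add(Node("arith.addi"))
m = Mutation(context=["func", "module"], new_kind="arith.muli")
if m.applies_at(op):
    op.kind = m.new_kind
```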
-
HeteroGen: transpiling C to heterogeneous HLS code with automated test generation and program repair
Despite the trend of incorporating heterogeneity and specialization in hardware, the development of heterogeneous applications is limited to a handful of engineers with deep hardware expertise. We propose HeteroGen, which takes C/C++ code as input and automatically generates an HLS version with test behavior preservation and better performance. Key to the success of HeteroGen is adapting the idea of search-based program repair to the heterogeneous computing domain while addressing two technical challenges. First, the turnaround time of HLS compilation and simulation is much longer than the usual C/C++ compilation and execution time; therefore, HeteroGen applies pattern-oriented program edits guided by common fix patterns and their dependences. Second, behavior and performance checking requires testing, but test cases are often unavailable. Thus, HeteroGen auto-generates test inputs suitable for checking C to HLS-C conversion errors, while providing high branch coverage for the original C code. An evaluation of HeteroGen shows that it produces an HLS-compatible version for nine out of ten real-world heterogeneous applications fully automatically, applying up to 438 lines of edits to produce an HLS version 1.63x faster than the original.
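The pattern-oriented repair loop can be sketched as follows; the fix patterns, the `hls_compile` oracle, and the `tests_pass` check are all hypothetical stand-ins, since the abstract does not spell out HeteroGen's actual pattern catalog.

```python
import re

# Illustrative (error signature -> rewrite) fix patterns; a real catalog
# would be mined from common C-to-HLS conversion failures.
FIX_PATTERNS = [
    (re.compile(r"recursion is not supported"),
     lambda src: src.replace("recurse(", "recurse_iterative(")),
    (re.compile(r"dynamic memory"),
     lambda src: src.replace("malloc", "static_pool_alloc")),
]

def repair(src, hls_compile, tests_pass, max_iters=10):
    """Apply pattern edits until the code compiles for HLS and behaves."""
    for _ in range(max_iters):
        ok, err = hls_compile(src)            # hypothetical: (bool, error text)
        if ok:
            return src if tests_pass(src) else None
        for signature, rewrite in FIX_PATTERNS:
            candidate = rewrite(src) if signature.search(err) else src
            if candidate != src:
                src = candidate               # keep the edit, recompile
                break
        else:
            return None                       # no pattern matched this error
    return None
```

The loop structure reflects the first challenge named above: because each `hls_compile` call is expensive, edits are chosen by matching error signatures to known patterns rather than by blind search.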
-
Over the past few years, several quantum software stacks (QSS) have been developed in response to rapid hardware advances in quantum computing. A QSS includes a quantum programming language, an optimizing compiler that translates a quantum algorithm written in a high-level language into quantum gate instructions, a quantum simulator that emulates these instructions on a classical device, and a software controller that sends analog signals to very expensive quantum hardware based on quantum circuits. In comparison to traditional compilers and architecture simulators, QSSes are difficult to test due to the probabilistic nature of results, the lack of clear hardware specifications, and quantum programming complexity. This work devises a novel differential testing approach for QSSes, named QDiff, with three major innovations: (1) we generate input programs to be tested via semantics-preserving, source-to-source transformations to explore program variants; (2) we speed up differential testing by filtering out quantum circuits that are not worthwhile to execute on quantum hardware, based on static characteristics such as circuit depth, 2-gate operations, gate error rates, and T1 relaxation time; (3) we design an extensible equivalence checking mechanism via distribution comparison functions such as the Kolmogorov-Smirnov test and cross entropy. We evaluate QDiff with three widely used open-source QSSes: Qiskit from IBM, Cirq from Google, and Pyquil from Rigetti. By running QDiff on both real hardware and quantum simulators, we found several critical bugs revealing potential instabilities in these platforms. QDiff's source transformation is effective in producing semantically equivalent yet non-identical circuits (i.e., in 34% of trials), and its filtering mechanism can speed up differential testing by 66%.
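The distribution-comparison step can be made concrete with SciPy's two-sample Kolmogorov-Smirnov test. The sketch below assumes measurement results arrive as bitstring-to-count dictionaries (the shape Qiskit and Cirq both produce); the significance threshold is illustrative.

```python
from scipy.stats import ks_2samp

def samples_from_counts(counts):
    # Expand {"00": 480, "11": 520} into a flat list of integer outcomes.
    return [int(bits, 2) for bits, n in counts.items() for _ in range(n)]

def equivalent(counts_a, counts_b, alpha=0.05):
    # Two-sample KS test: fail to reject the null hypothesis that both
    # stacks sample the same output distribution.
    _stat, p_value = ks_2samp(samples_from_counts(counts_a),
                              samples_from_counts(counts_b))
    return p_value >= alpha

# Usage: compare the same circuit's counts from two stacks.
print(equivalent({"00": 480, "11": 520}, {"00": 505, "11": 495}))  # True
```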
-
As specialized hardware accelerators like FPGAs become a prominent part of the current computing landscape, software applications are increasingly constructed to leverage heterogeneous architectures. Such a trend is already happening in the domain of machine learning and Internet-of-Things (IoT) systems built on edge devices. Yet debugging and testing methods for heterogeneous applications are currently lacking. These applications may look similar to regular C/C++ code but include hardware synthesis details in the form of preprocessor directives; because of those synthesis details, their behavior under heterogeneous architectures may diverge significantly from CPU behavior. Further, the compilation and hardware simulation cycle takes an enormous amount of time, prohibiting the frequent invocations required for fuzz testing. We propose a novel fuzz testing technique, called HeteroFuzz, designed to specifically target heterogeneous applications and to detect platform-dependent divergence. The key essence of HeteroFuzz is a three-pronged approach to reduce the long latency of repetitively invoking a hardware simulator on a heterogeneous application. First, in addition to monitoring code coverage as a fuzzing guidance mechanism, we analyze synthesis pragmas in kernel code and monitor accelerator-relevant value spectra. Second, we design dynamic probabilistic mutations to increase the chance of hitting divergent behavior under different platforms. Third, we memorize the boundaries of seen kernel inputs and skip HLS simulator invocation if it can expose only redundant divergent behavior. We evaluate HeteroFuzz on seven real-world heterogeneous applications with FPGA kernels. HeteroFuzz is 754X faster than naive fuzzing at exposing the same set of distinct divergence symptoms. Probabilistic mutations contribute a 17.5X speedup over fuzzing without them, and selective invocation of HLS simulation contributes an 8.8X speedup over fuzzing without it.
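The third prong, skipping redundant simulator runs, amounts to memoizing the input regions whose divergence outcome is already known. The sketch below is an assumed, simplified form of that idea for scalar kernel inputs; `run_hls_sim` is a hypothetical oracle returning whether CPU and FPGA results diverged.

```python
seen = []  # memoized (lo, hi, diverged) boundaries of explored inputs

def should_simulate(value):
    # Skip the expensive HLS simulation if this input falls inside a
    # region whose divergence behavior has already been observed.
    return not any(lo <= value <= hi for lo, hi, _ in seen)

def record(value, diverged, widen=8):
    # Widen the memoized boundary around an input with a known outcome.
    seen.append((value - widen, value + widen, diverged))

def fuzz_step(value, run_hls_sim):
    if should_simulate(value):
        record(value, run_hls_sim(value))  # pay simulator latency once
```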
-
As big data analytics become increasingly popular, data-intensive scalable computing (DISC) systems help address the scalability issue of handling large data. However, automated testing for such data-centric applications is challenging because data is often incomplete, continuously evolving, and hard to know a priori. Fuzz testing has proven highly effective in other domains such as security; however, it is nontrivial to apply traditional fuzzing to big data analytics directly, for three reasons: (1) the long latency of DISC systems prohibits the applicability of fuzzing, as naïve fuzzing would spend 98% of the time setting up a test environment; (2) conventional branch coverage is unlikely to scale to DISC applications because most binary code comes from the framework implementation, such as Apache Spark; and (3) random bit- or byte-level mutations can hardly generate meaningful data, which fails to reveal real-world application bugs. We propose a novel coverage-guided fuzz testing tool for big data analytics, called BigFuzz. The key essence of our approach is that (a) we focus on exercising application logic, as opposed to increasing framework code coverage, by abstracting the DISC framework using specifications; BigFuzz performs automated source-to-source transformations to construct an equivalent DISC application suitable for fast test generation; and (b) we design schema-aware data mutation operators based on our in-depth study of DISC application error types. BigFuzz speeds up fuzzing by 78X to 1477X compared to random fuzzing, improves application code coverage by 20% to 271%, and achieves a 33% to 157% improvement in detecting application errors. Compared to the state of the art that uses symbolic execution to test big data analytics, BigFuzz is applicable to twice as many programs and can find 81% more bugs.
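A schema-aware mutation, in contrast to random bit flips, can be sketched like this; the schema and the edge values are assumptions for illustration, not BigFuzz's actual operators.

```python
import random

SCHEMA = [("zipcode", int), ("price", float), ("city", str)]  # assumed schema

def mutate_record(fields):
    # Mutate one field according to its declared type, so the record
    # stays parseable and exercises application logic, not the parser.
    i = random.randrange(len(fields))
    _name, ftype = SCHEMA[i]
    out = list(fields)
    if ftype is int:
        out[i] = str(int(fields[i]) + random.choice([-1, 1, 10**6]))
    elif ftype is float:
        out[i] = random.choice(["NaN", str(-float(fields[i]))])
    else:
        out[i] = fields[i] + random.choice(["", "\u00e9", ","])
    return out

print(mutate_record(["90095", "1200.50", "Los Angeles"]))
```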
